Linear Bandits with Feature Feedback
Authors
Abstract
Related resources
Linear Contextual Bandits with Knapsacks
We consider the linear contextual bandit problem with resource consumption, in addition to reward generation. In each round, the outcome of pulling an arm is a reward as well as a vector of resource consumptions. The expected values of these outcomes depend linearly on the context of that arm. The budget/capacity constraints require that the total consumption doesn’t exceed the budget for each ...
Bandits with Delayed, Aggregated Anonymous Feedback
We study a variant of the stochastic K-armed bandit problem, which we call “bandits with delayed, aggregated anonymous feedback”. In this problem, when the player pulls an arm, a reward is generated, however it is not immediately observed. Instead, at the end of each round the player observes only the sum of a number of previously generated rewards which happen to arrive in the given round. The...
Online Learning with Feedback Graphs: Beyond Bandits
We study a general class of online learning problems where the feedback is specified by a graph. This class includes online prediction with expert advice and the multiarmed bandit problem, but also several learning problems where the online player does not necessarily observe his own loss. We analyze how the structure of the feedback graph controls the inherent difficulty of the induced T -roun...
Threshold Bandits, With and Without Censored Feedback
We consider the Threshold Bandit setting, a variant of the classical multi-armed bandit problem in which the reward on each round depends on a piece of side information known as a threshold value. The learner selects one of K actions (arms), this action generates a random sample from a fixed distribution, and the action then receives a unit payoff in the event that this sample exceeds the thres...
Combinatorial Multi-Armed Bandits with Filtered Feedback
Motivated by problems in search and detection we present a solution to a Combinatorial Multi-Armed Bandit (CMAB) problem with both heavy-tailed reward distributions and a new class of feedback, filtered semibandit feedback. In a CMAB problem an agent pulls a combination of arms from a set {1, ..., k} in each round, generating random outcomes from probability distributions associated with these ...
Journal
Journal title: Proceedings of the AAAI Conference on Artificial Intelligence
Year: 2020
ISSN: 2374-3468,2159-5399
DOI: 10.1609/aaai.v34i04.5980